October 2018

This replication package contains data and code to replicate the analyses presented in "Family Matters? Voting Behavior in Households with Criminal Justice Contact" and its online appendix.
Contact Ariel White (arwhi@mit.edu) with questions. 

I have not included the raw court records data underlying this project, out of concern that circulating criminal-history data could eventually make life more difficult for some of the people named in those records (especially if they manage to have their records expunged and my dataset does not reflect that).  Instead, I include the scripts I used to get from the raw data to the merged dataset included here, so others can judge the merging process.  Anyone particularly interested in reproducing that part of the project can request the court records data from the Harris County District Clerk's office (they call it "historical criminal records data").  

There is one main analysis script ("Mainbeforeafter_spring2018.R") and one main de-identified merged dataset ("main_householdsmerged_deidentified.Rdata") that reproduce key analyses from the paper, and most of the results presented can be replicated by loading only that dataset.  I have also included several other merged datasets that allow for the replication of various additional analyses. Several other figures in the depths of the online appendix require yet more data; here I've included replication code for them, with a note to email me if you want the data to run that code. 

Datasets:
- "main_householdsmerged_deidentified.Rdata" - This is the main dataset for the project, and the central analyses can be replicated using only this dataset.
- "households_plusfullfile_deidentified.Rdata" - This is a snapshot of the Harris County voter file (with identifying columns dropped), with indicators for proximal contact. 
- "felonycases_householdsmerged_deidentified.Rdata" - This is a dataset that is equivalent to the main project dataset, except that it results from identifying voters who saw a household member face felony (rather than misdemeanor) charges.
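
A minimal sketch of how one of these files might be opened and inspected in R, assuming the file sits in the current working directory. The helper name inspect_rdata is hypothetical and not part of the replication scripts. No object names are assumed, because load() itself reports what the file contains.

```r
# Hypothetical helper (not part of the replication scripts): load an .Rdata
# file into its own environment and report what it contains, without
# assuming any object names.
inspect_rdata <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(NULL))
  }
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  for (nm in loaded) {
    obj <- get(nm, envir = env)
    cat(nm, ": ", class(obj)[1], "\n", sep = "")
    if (is.data.frame(obj)) {
      cat("  rows: ", nrow(obj), ", columns: ", ncol(obj), "\n", sep = "")
    }
  }
  invisible(env)
}

# Assumes the main dataset is in the working directory.
main_env <- inspect_rdata("main_householdsmerged_deidentified.Rdata")
```

Loading into a separate environment keeps the restored objects from overwriting anything already in the workspace; the same call works for the other two datasets listed above.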

Scripts:
- "Harris_defendantfileprep.R" As noted above, this script cannot be run without acquiring the criminal case records.  This script generates well-formatted geocoded (and still-identifiable) datasets of defendants and voters that will be used in "Mainmerge_spring2018.R". 

- "Mainmerge_spring2018.R" As noted above, this script cannot be run without acquiring the criminal case records.  This script generates the main de-identified datasets "main_householdsmerged_deidentified.Rdata" and "households_plusfullfile_deidentified.Rdata".  It also generates Figure SI.5 (the raw count of all first-time misdemeanor cases over time).  The first half of this code is then repeated, with slight modifications, in "Mainmerge_spring2018_felonyversion.R", in order to generate the de-identified dataset "felonycases_householdsmerged_deidentified.Rdata" for the felony analysis from the SI. 

- "Mainbeforeafter_spring2018.R" pulls in the merged dataset "main_householdsmerged_deidentified.Rdata" and produces most of the tables and figures in the paper and online appendix. 

- "Fullfileanalysis_spring2018.R" pulls in the merged dataset "households_plusfullfile_deidentified.Rdata" and generates Table 1 (descriptive comparison of households with proximal contact to the full file) and Table SI.1 (simplest OLS with no identification strategy).

- "beforeafter_felonycases_spring2018.R" pulls in the merged dataset "felonycases_householdsmerged_deidentified.Rdata" and generates Figure SI.7 (reproducing Figure 2 from the main paper, but focusing on felony rather than misdemeanor cases). 

- "Other_SI_code.R" contains the code that generated Figures SI.11, SI.12, and SI.20.  This code requires additional data to run (shapefiles and an additional voter file), so I've included it here for transparency only; anyone interested in running it locally should email me for the necessary data. 
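
As a rough sketch, the script-to-dataset pairings described above can be checked before running anything. The helper name check_replication_files is hypothetical, and the sketch assumes the scripts and datasets sit together in one directory; the scripts themselves may expect a different layout or working directory.

```r
# Script -> dataset pairs, as described in this README. Only the three
# scripts that run on the included de-identified data are listed.
replication_pairs <- c(
  "Mainbeforeafter_spring2018.R"         = "main_householdsmerged_deidentified.Rdata",
  "Fullfileanalysis_spring2018.R"        = "households_plusfullfile_deidentified.Rdata",
  "beforeafter_felonycases_spring2018.R" = "felonycases_householdsmerged_deidentified.Rdata"
)

# Hypothetical helper (not part of the package): report which scripts and
# datasets are present in `dir`. Assumes they live in the same directory.
check_replication_files <- function(dir = ".", pairs = replication_pairs) {
  data.frame(
    script       = names(pairs),
    dataset      = unname(pairs),
    script_found = file.exists(file.path(dir, names(pairs))),
    data_found   = file.exists(file.path(dir, unname(pairs))),
    stringsAsFactors = FALSE
  )
}

status <- check_replication_files()
print(status)

# A script whose inputs are present could then be run on its own, e.g.:
# source("Mainbeforeafter_spring2018.R")
```

The two merge scripts, the defendant file prep script, and "Other_SI_code.R" are left out of the check because they need data that is not distributed with this package.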
